
Initial LMBuddy class for running jobs #84

Merged · 17 commits into dev/RD2024-147/buddy-class from sfriedowitz/buddy-job-runner on Mar 18, 2024

Conversation

@sfriedowitz (Contributor, Author) commented Mar 14, 2024

What's changing

  • Removes the run_job method in favor of an LMBuddy class with finetune and evaluate methods (sketched below)
  • Implements job-result data structures for returning data generated by job entrypoints, instead of simply having the entrypoints write to disk/W&B
  • Implements a new LoadableAssetPath type and associated data structures to represent any load_from path for a HF asset. See inline comments for motivation for this change.

Note that the CLI is unchanged by these internal refactors, so you can still execute the package as a Ray entrypoint in the same manner as before.
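Below is a minimal sketch of how the new interface might be used, based on the description above; the method names come from the bullets, but the exact signatures and the shape of the returned result objects are assumptions, not the final API.

from lm_buddy import LMBuddy
from lm_buddy.jobs.configs import FinetuningJobConfig, LMHarnessJobConfig


def run_jobs(finetuning_config: FinetuningJobConfig, lm_harness_config: LMHarnessJobConfig):
    # A single LMBuddy instance exposes one method per job type, replacing run_job.
    buddy = LMBuddy()

    # Each call returns a job-result data structure instead of only writing
    # outputs to disk or W&B (the exact result fields are not shown here).
    finetune_result = buddy.finetune(finetuning_config)
    eval_result = buddy.evaluate(lm_harness_config)
    return finetune_result, eval_result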

How to test it

  • Run the test suite
  • Pull the branch and try running some jobs with the new interface

Related Jira Ticket

Additional notes for reviewers

In follow-up PRs into this dev branch, I would like to do the following:

@sfriedowitz sfriedowitz changed the base branch from main to dev/RD2024-147/buddy-class March 14, 2024 19:54
@sfriedowitz sfriedowitz changed the title Buddy class for running jobs Initial LMBuddy class for running jobs Mar 14, 2024
@sfriedowitz (Contributor, Author) commented on a review thread:

This is a duplicate of the other hf_config.yaml file since it already has the quantization section specified.

)
print("Logging artifact for model checkpoint...")
artifact_loader.log_artifact(model_artifact)
ckpt_path, artifact_config = None, None
@sfriedowitz (Contributor, Author) commented on a review thread:

In a follow-up PR (https://mzai.atlassian.net/browse/RD2024-152), I would like to refactor a bit how we are generating artifacts and results in these methods.

The issue is that because the tracking field is optional, we repeatedly end up having to write two code branches, (1) for when we initialize a W&B run and create an artifact and (2) for when we run the job without tracking or artifacts. It's almost like we want something like maybe_initialize_wandb_run that handles the optionality of the tracking service.
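A rough sketch of the kind of helper being described here, purely illustrative and not part of this PR; it assumes the tracking config exposes name/project/entity fields as in the WandbRunConfig example further down.

from contextlib import contextmanager

import wandb


@contextmanager
def maybe_initialize_wandb_run(tracking_config=None):
    """Yield an active W&B run when tracking is configured, otherwise None."""
    if tracking_config is None:
        # No tracking requested: the job runs without a W&B run or artifacts.
        yield None
        return
    run = wandb.init(
        name=tracking_config.name,
        project=tracking_config.project,
        entity=tracking_config.entity,
    )
    try:
        yield run
    finally:
        run.finish()


# Inside a job method, the two branches collapse into one code path:
# with maybe_initialize_wandb_run(config.tracking) as run:
#     ... run the job ...
#     if run is not None:
#         run.log_artifact(model_artifact)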

Resolved review thread (outdated): src/lm_buddy/buddy.py
@sfriedowitz sfriedowitz marked this pull request as ready for review March 14, 2024 21:48
@veekaybee (Member) commented:
Looking now! Pulled the branch, read the instructions for direct_job_execution.ipynb, and put together a script like this. Where do we specify the cluster information in the new workflow? Seems like it should be somewhere here, right? FinetuningRayConfig?

from ray.job_submission import JobSubmissionClient
from pathlib import Path


from lm_buddy import LMBuddy
from lm_buddy.jobs.configs import (
    FinetuningJobConfig,
    FinetuningRayConfig,
    LMHarnessJobConfig,
    LMHarnessEvaluationConfig,
)
from lm_buddy.integrations.huggingface import (
    AutoModelConfig,
    TextDatasetConfig,
    TrainerConfig,
    AdapterConfig,
)
from lm_buddy.integrations.wandb import WandbRunConfig

# Base model to finetune from HuggingFace
model_config = AutoModelConfig(load_from="distilgpt2")

# Text dataset for finetuning
dataset_config = TextDatasetConfig(
    load_from="imdb",
    split="train[:100]",
    text_field="text",
)

# HuggingFace trainer arguments
trainer_config = TrainerConfig(
    max_seq_length=256,
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="epoch",
    save_steps=1,
)

# LORA adapter settings
adapter_config = AdapterConfig(
    peft_type="LORA",
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    lora_dropout=0.2,
)

# Define tracking for finetuning run
tracking_config = WandbRunConfig(
    name="example-finetuning",
    project="lm-buddy-examples",  # Update to your project name
    entity="mozilla-ai",  # Update to your entity name
)

# Ray train settings
ray_config = FinetuningRayConfig(
    use_gpu=False,  # Change to True if GPUs are available on your machine
    num_workers=2,
)

# Full finetuning config
finetuning_config = FinetuningJobConfig(
    model=model_config,
    dataset=dataset_config,
    trainer=trainer_config,
    adapter=adapter_config,
    tracking=tracking_config,
    ray=ray_config,
)

@sfriedowitz (Contributor, Author) replied:
Where do we specify the cluster information in the new workflow?

Nothing is changing in how you specify the cluster information. The CLI of the package is unchanged, so you can use the same commands as the entrypoint for a Ray job submission via the Ray SDK, as sketched below.
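For reference, a hedged sketch of that flow using the Ray job submission SDK; the cluster address and the CLI entrypoint string are placeholders, not commands defined by this PR.

from ray.job_submission import JobSubmissionClient

# Address of the Ray cluster's job submission server (placeholder).
client = JobSubmissionClient("http://127.0.0.1:8265")

client.submit_job(
    # Placeholder entrypoint: use the same CLI command you ran before this PR.
    entrypoint="python -m lm_buddy finetune --config finetuning_config.yaml",
    runtime_env={"working_dir": "."},
)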

Resolved review threads (outdated): src/lm_buddy/buddy.py, src/lm_buddy/cli/run.py, src/lm_buddy/paths.py (two threads)
@veekaybee (Member) commented Mar 15, 2024:

Tested and left some comments; unit tests pass and the sample job works!

@sfriedowitz (Contributor, Author) replied:

Thanks! I'm a bit sidetracked at the moment, but will address most of them in the next few hours.

@veekaybee (Member) left a review comment:

Thanks for addressing! LGTM

@sfriedowitz sfriedowitz merged commit 4c08a29 into dev/RD2024-147/buddy-class Mar 18, 2024
3 checks passed
@sfriedowitz sfriedowitz deleted the sfriedowitz/buddy-job-runner branch March 18, 2024 18:24